Repetitive Strings are not Context-Free
Authors
Abstract
Let Σ be an alphabet. A string of the form xyyz with x, z ∈ Σ* and y ∈ Σ⁺ is called repetitive. In this paper we prove that the set of repetitive strings over an alphabet of three or more letters is not context-free, settling a conjecture from [1].

R. Ross, K. Winklmann
Received in October 1980, revised in March 1981.
Computer Science Department, Washington State University, Pullman, Washington 99164, U.S.A.
Supported in part by N.S.F. Grant No. MCS-80004128.
R.A.I.R.O. Informatique théorique/Theoretical Informatics, 0399-0540/1982/191/$5.00 © AFCET-Bordas-Dunod

Let Σ be an alphabet. A string of the form yy with y ∈ Σ⁺ is called a repetition. A string which contains a repetition as a substring (i.e. a string of the form xyyz with x, z ∈ Σ* and y ∈ Σ⁺) is called a repetitive string. Interest in repetitive and nonrepetitive strings dates back at least as far as Thue's 1906 paper [13]. One question which has defied an answer for some time is whether or not the set of repetitive strings over an alphabet of three or more characters is context-free. (For binary alphabets the question is trivial.) This question is settled in this paper, confirming a conjecture from [1].

THEOREM. The set of repetitive strings over an alphabet of three or more characters is not context-free.

Classical techniques for showing languages to be not context-free appear to be of no use in proving this result, cf. [1, pp. 374-375]. At the same time, intuition very strongly suggests that repetitive strings are not context-free for the same reason that repetitions (i.e. strings of the form ww with w ∈ Σ⁺) are not context-free: the first-in-last-out nature of a pushdown store does not provide the means to remember one substring and then check it for equality with another substring, because the information needed first for such a check is bound to reside at the bottom of the store. Thus, the situation is one where strong intuition does not readily translate into a proof. We regard this as a deficiency in the theory of context-free languages, which we expect to repair by first dealing with a special instance (the non-context-freeness of repetitive strings) and then, in a subsequent paper, generalizing our proof technique into a new necessary condition for context-freeness.

The remainder of this paper consists of a proof of the above theorem and some concluding remarks. Ends of proofs are marked with the symbol □.

We first prove that repetitive strings over a six-letter alphabet are not context-free. The extension to three-letter alphabets will follow quite easily, using a result from [2].

Let R be the set of repetitive strings over some fixed alphabet containing at least the six symbols a, b, c, S, 0, and 1. For the sake of deriving a contradiction assume that R is context-free. Let M be a nondeterministic pushdown automaton with L(M) = R. Without loss of generality we may assume that M has the following properties:
- it accepts by empty store,
- it has only one internal state,
- it changes its stack height by at most 1 in any single step, and
- it reads one input symbol in every step.

These properties of M are the result of assuming that the grammar for R is given in "2-Greibach-Normal-Form" [5, 9, 11, 12], where all productions are of the form A → aBC, A → aB, or A → a, with A, B, C being syntactic variables and a a terminal symbol. The standard construction of a nondeterministic pushdown automaton from a context-free grammar (see e.g. [8], pp. 115-116) then yields a machine with the above properties.

The basic idea behind our proof is to analyze how the pda M can store information about its input.
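Two notions used above can be made concrete with a small sketch: the definition of a repetitive string, and the shape of a machine like M (one state, acceptance by empty store, one input symbol per step, stack height changing by at most 1). Everything named here is an assumption introduced for illustration and is not taken from the paper: the helpers `is_repetitive`, `all_repetitive`, and `pda_accepts`, and in particular `TOY_GRAMMAR`, a hypothetical grammar in 2-Greibach normal form for the context-free language {aⁿbⁿ : n ≥ 1}. The simulation follows the standard grammar-to-pda construction only in outline. The sketch also illustrates why the binary case is trivial: every binary string of length 4 or more contains a repetition.

```python
from itertools import product
from typing import Dict, List, Set, Tuple


def is_repetitive(s: str) -> bool:
    """Return True if s = x y y z for some nonempty y (s contains a square)."""
    n = len(s)
    for start in range(n):
        # half is |y|; the candidate square occupies s[start : start + 2*half]
        for half in range(1, (n - start) // 2 + 1):
            if s[start:start + half] == s[start + half:start + 2 * half]:
                return True
    return False


def all_repetitive(alphabet: str, length: int) -> bool:
    """Check whether every string of the given length over alphabet is repetitive."""
    return all(is_repetitive("".join(w)) for w in product(alphabet, repeat=length))


# Hypothetical toy grammar (not from the paper) in 2-Greibach normal form
# for {a^n b^n : n >= 1}:   A -> a A B | a B,   B -> b
# Keys are (variable, terminal); values list the variables pushed in its place.
Grammar = Dict[Tuple[str, str], List[Tuple[str, ...]]]
TOY_GRAMMAR: Grammar = {
    ("A", "a"): [("A", "B"), ("B",)],
    ("B", "b"): [()],
}


def pda_accepts(grammar: Grammar, start: str, word: str) -> bool:
    """Simulate the one-state, empty-store pda obtained from a 2-GNF grammar."""
    stacks: Set[Tuple[str, ...]] = {(start,)}  # index 0 is the top of the stack
    for i, symbol in enumerate(word):
        remaining = len(word) - i - 1
        successors: Set[Tuple[str, ...]] = set()
        for stack in stacks:
            if not stack:
                continue  # an empty store admits no further move
            top, rest = stack[0], stack[1:]
            for pushed in grammar.get((top, symbol), []):
                new_stack = pushed + rest  # pop one symbol, push at most two
                # Each stack symbol needs one input symbol to be popped,
                # so a taller stack can never be emptied in time.
                if len(new_stack) <= remaining:
                    successors.add(new_stack)
        stacks = successors
    return () in stacks
```

For instance, `is_repetitive("abcbca")` holds because of the substring `bcbc`, while `abcacb` contains no repetition. `pda_accepts(TOY_GRAMMAR, "A", "aabb")` holds, and `pda_accepts(TOY_GRAMMAR, "A", "abab")` does not. Each simulated step pops one stack symbol and pushes at most two, so the stack height changes by at most 1, matching the properties assumed for M.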
Specifically, we are going to exploit the fact that information received (on the input tape) during an early stage of the computation is bound to reside near the bottom of the stack. Information on the stack simply cannot be arbitrarily juggled around.

Some of the technical details of such an analysis are simplified by the assumption that the height of the stack changes by at most 1 in any single move. But while convenient, this assumption is not essential. At the cost of adding some technical detail to the proof we could adopt the weaker assumption that there is some constant k, not necessarily 1, such that M changes its stack height by at most k in any single step. This weaker assumption corresponds to assuming that the grammar for R is given in Greibach-Normal-Form but not necessarily in 2-Greibach-Normal-Form.

Let s denote the size of the stack alphabet of M. Choose two natural numbers, p and q, such that they satisfy:
Similar resources
Finding Syntactic Structures from Human Motion Data
We present a new approach to motion rearrangement that preserves the syntactic structures of an input motion automatically by learning a context-free grammar from the motion data. For grammatical analysis, we reduce an input motion into a string of terminal symbols by segmenting the motion into a series of subsequences, and then associating a group of similar subsequences with the same symbol. ...
On the Estimation of Error-Correcting Parameters
Error-Correcting (EC) techniques allow for coping with divergences in pattern strings with regard to their "standard" form as represented by the language L accepted by a regular or context-free grammar. There are two main types of EC parsers: minimum-distance and stochastic. The latter apply the maximum likelihood rule: classification into the classes of the strings in L that have the greatest p...
Algorithmics on SLP-compressed strings: A survey
Results on algorithmic problems on strings that are given in a compressed form via straight-line programs are surveyed. A straight-line program is a context-free grammar that generates exactly one string. In this way, exponential compression rates can be achieved. Among others, we study pattern matching for compressed strings, membership problems for compressed strings in various kinds of formal...
The complexity of automatic complexity
We define the automatic complexity A(E) of an equivalence relation E on a finite set S of strings. The minimum complexity is the number of equivalence classes |E|. We prove that the problem to determine whether A(E) = |E| is NP-complete, and the problem to determine whether A(E) = |E| + k for a fixed k ≥ 1 is complete for the second level of the Boolean hierarchy for NP, i.e., BH2-complete. The...
Free Vibration Analysis of Repetitive Structures using Decomposition, and Divide-Conquer Methods
This paper consists of three sections. In the first section an efficient method is used for decomposition of the canonical matrices associated with repetitive structures. To this end, a cylindrical coordinate system as well as a special numbering scheme were employed. In the second section, the divide and conquer method has been used for eigensolution of these structures, where the matrices are in ...
Journal title:
- ITA
Volume 16, Issue
Pages -
Publication date 1982